(57) Abstract

The invention relates to an optical character recognition system in which video data defining an image is fed to a bit map
(7) for subsequent processing. The image in the bit map is segmented and classified using an N-tuple processor. The segmentation
and classification are implemented by synchronous state machines.
IMAGE RECOGNITION

The invention relates to methods and apparatus for
recognising two dimensional images, such as text
characters, represented in binary form as a bit map of
pixels.

Various character recognition systems have been
developed and proposed and these systems generally fall
into two types:

1. Template (mask) matching or Matrix matching: where
the image of the character is compared with a set of
stored prototype images to achieve a match and recognise
the character. The technique is constrained by the
amount of computer memory required to store the different
fonts, it requires the character font to be known to the
product, it requires well-defined characters, and it does not
"learn from its mistakes".

Where a good match cannot be expected, the product
costs increase with:

a. pre-processing to remove distortions.

b. post-processing to assess the degrees of match
to the prototype templates.

2. Topological (Topographical) analysis or Shape
(feature) analysis: where the shape and features of a
character image are examined in order that an algorithmic
match may be attempted. Such a technique has a high
degree of font independence and it has a learning
capability. Problems exist with poorly defined,
distorted or broken characters (images) such as are met
with in every-day print since these distortions affect
the features by which the character is to be recognised.
Software means are predominantly used to perform
topological analysis. Thus the recognition speeds tend
to be low, in order to restrain the product costs: since
the recognition speed is dependent on the execution times
of the recognition computer system, the faster the
recognition speed, the more powerful the computer system
that is required and the greater the product costs.

Techniques have been developed based on so-called
N-tuple classifiers which were originally described in a
paper entitled "Pattern Recognition and Reading by
Machine" by Bledsoe and Browning, 1959 Proceedings of
Eastern Joint Computer Conference, pages 225-232 and
which are also described in "Guide to pattern recognition
using random-access memories" by Aleksander and Stonham,
1979, Computers and Digital Techniques, Vol. 2, No. 1,
pages 29-40. The N-tuple method is essentially a means
of comparing information presented to a system with
information already "learnt" by the system, so that the
system can then make "most like" decisions. This
methodology has an ability to cope with the recognition
of patterns and shapes including multi-font character
recognition. It does not require the font (to be
recognised) to be pre-determined; it does however require
an adequate training, over a sufficient range of fonts
and over a sufficient range of distortions within a font
to be able to discriminate between characters, for those
fonts which it is likely to be required to recognise.
Examples of patent specifications illustrating these
N-tuple techniques are GB-A-1296701, GB-A-1431438, and
GB-A-2112194. These systems achieve improved
recognition results over the previous types of pattern
recognition systems but require either very expensive
but fast hardware based systems or lower priced (but
still expensive), slow software based systems.
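
As a rough illustration of the N-tuple idea outlined above, the following Python sketch gives each class a discriminator that records which bit patterns it has seen at fixed, randomly chosen groups of N pixel positions, and scores an unknown image by counting how many of its tuples were seen in training. The names `make_mapping` and `NTupleDiscriminator`, the tuple size and the toy 3 x 3 images are assumptions made for illustration; they are not taken from the cited papers or specifications.

```python
import random


def make_mapping(num_pixels, n, seed=0):
    """Randomly partition the pixel indices into tuples of n pixels each."""
    rng = random.Random(seed)
    indices = list(range(num_pixels))
    rng.shuffle(indices)
    return [indices[i:i + n] for i in range(0, num_pixels - n + 1, n)]


class NTupleDiscriminator:
    """One discriminator per class: a store of 'learnt' tuple patterns."""

    def __init__(self, mapping):
        self.mapping = mapping
        self.seen = [set() for _ in mapping]  # one pattern store per tuple

    def _addresses(self, pixels):
        # Each tuple of pixel values forms an n-bit address.
        for positions in self.mapping:
            address = 0
            for p in positions:
                address = (address << 1) | pixels[p]
            yield address

    def train(self, pixels):
        for store, address in zip(self.seen, self._addresses(pixels)):
            store.add(address)

    def score(self, pixels):
        # Number of tuples whose pattern was seen during training.
        return sum(address in store
                   for store, address in zip(self.seen, self._addresses(pixels)))


# Toy 3 x 3 images, flattened row by row (1 = black, 0 = white).
mapping = make_mapping(num_pixels=9, n=3)
bar = [0, 1, 0, 0, 1, 0, 0, 1, 0]
dash = [0, 0, 0, 1, 1, 1, 0, 0, 0]

disc_bar = NTupleDiscriminator(mapping)
disc_bar.train(bar)
disc_dash = NTupleDiscriminator(mapping)
disc_dash.train(dash)

print(disc_bar.score(bar), disc_bar.score(dash))
```

The "most like" decision is then simply the class whose discriminator gives the highest score for the presented image.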

In accordance with one aspect of the present
invention, image recognition apparatus comprises a first
synchronous state machine for segmenting a number of
images defined in bit map form into separate pixel
groups; and a second synchronous state machine to which
each pixel group is applied for classification.

The intention is that each pixel group found by the
segmenting state machine will correspond to an image
which can be classified.

The inventors have realised the unique merits of the
N-tuple method in coping with the variable print quality
of "real-world" documents.

The inventors have further realised that the
problems associated with the previous application of the
N-tuple method could be overcome by the design approach.
These problems are the inter-relationships of the speed
of recognition and the product costs, that is slow
operation, high expense.

An important feature of the invention is that the 
inventors have realised the unique and significant 
advantages in using a synchronous state machine approach 
to implement much of the core technology recognition 
function. This approach is particularly advantageous
when utilising a technology development based on the 
N-tuple method of pattern recognition. 

The core technology recognition function comprises:

(a) Segmentation. The process of breaking the
scanned information into separate distinct
images i.e. the process of shape extraction.
The segmentation process is coupled with:
Registration. The process of providing
positional information, to register the
relationship of the individual segmented
images, thereby allowing the "recognised"
characters to be assembled into a data-stream
to an appropriate format.

(b) Classification. The process of classifying the
images into pre-defined classes.
The classification process includes the means of
handling cases where the classifier:

(i) is unable to make a true decision, i.e. a
reject error; in this case the classifier
should label the result accordingly;

(ii) makes a wrong decision, i.e. a substitution
error; in this case the system has to recognise
the error from other information such as the context.

Segmentation and classification [...] control of a
system clock. Thus the time [...] be avoided which can
occur in asynchronous machines, associated with the
processing of each stage, for example due to interrupt
routines, polling routines, hand-shaking routines and
the like.

A state machine approach for [...] application of the
N-tuple [...] allows the use of a hardware
implementation. This [...] a much higher speed of image
recognition (than can be achieved by predominantly a
software approach), [...] moderate product prices which
are associated with the software based products.

In accordance with a second aspect of the present
invention, a method of recognising images represented by
respective digital pixel groups comprises presenting each
pixel group to an N-tuple classifier having a number of
discriminators each adapted to recognise a respective
class of a predetermined group of classes,
characterised in that each pixel group is presented to
the discriminators in a predetermined sequence; and in
that as soon as the output from a discriminator satisfies
a recognition condition, the presentation of the pixel
group to the classifier is terminated.

In accordance with a third aspect of the present 
invention, apparatus for recognising images represented 
by respective digital pixel groups comprises an N-tuple 
classifier including a number of discriminators each 
adapted to recognise a respective class of a 
predetermined group of classes and to which the pixel 
groups are presented, the apparatus being arranged to 
present each pixel group to the discriminators in a 
predetermined sequence; and recognition means for 
monitoring the output of the discriminators and for 
terminating the presentation of the pixel group to the 
classifier as soon as the output from a discriminator 
satisfies a recognition condition.

For the first time, we have realised that it is 
possible to make operation of an N-tuple classifier 
interactive with the recognition process so that as soon 
as a character is sufficiently identified, further 
operation of the classifier is terminated.
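
The early termination described above can be sketched in Python as follows. This is a minimal illustration only: `classify_sequential` and `template_score` are hypothetical names, and the pixel-matching score is a stand-in for a real N-tuple discriminator response.

```python
def classify_sequential(pixel_group, discriminators, threshold):
    """Present pixel_group to the discriminators in their given sequence.

    discriminators: list of (label, score_function) pairs, already arranged
    in the predetermined sequence. Presentation stops as soon as one score
    exceeds the threshold. Returns (label or None, discriminators evaluated).
    """
    for evaluated, (label, score_fn) in enumerate(discriminators, start=1):
        if score_fn(pixel_group) > threshold:
            return label, evaluated
    return None, len(discriminators)


def template_score(template):
    """Stand-in score: number of pixels agreeing with a stored template."""
    return lambda pixels: sum(a == b for a, b in zip(template, pixels))


# Hypothetical four-pixel "images"; the sequence order is an assumption.
sequence = [
    ("e", template_score([1, 1, 1, 0])),
    ("t", template_score([0, 1, 1, 0])),
    ("a", template_score([1, 0, 0, 1])),
]
label, evaluated = classify_sequential([0, 1, 1, 0], sequence, threshold=3)
print(label, evaluated)
```

In this toy run the third discriminator is never consulted, which is the saving the interactive approach is aiming for.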

In one example, the method comprises comparing the
output from each discriminator with a threshold, the
recognition condition being satisfied when the threshold
is exceeded. Typically, the situation is:

(i) There will be some threshold 'A' above which
the image is identified (recognised).

(ii) There will be some threshold 'B' below which
the image is not immediately recognised.

(iii) The band between 'A' and 'B' for which the image is
recognised as belonging to a group of classes, for
example lower case o e c, but further processing is
required, to allow the particular image to be
recognised.

(iv) In the event that the discriminator response
"scores" are less than 'B' this would mean that the
classifier has performed a complete operation; in
that case the Ranking Order of scores is examined
so that if the difference between the highest
discriminator output and the next discriminator
outputs satisfy predetermined criteria, then the
recognition condition is satisfied, i.e. the
character is recognised (as the highest score).
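
The four cases above can be sketched as a single decision function. This is an illustrative reading rather than the specified logic: the function name `decide`, the integer score scale and the use of a simple score margin as the "predetermined criteria" of case (iv) are all assumptions.

```python
def decide(scores, a, b, margin):
    """Apply thresholds A and B (a > b) to a dict of label -> score.

    Requires at least two discriminator scores.
    Returns (outcome, labels) where outcome is one of
    "recognised", "group" (further processing needed) or "reject".
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    best_label, best = ranked[0]
    runner_up = ranked[1][1]

    if best > a:                      # case (i): clearly recognised
        return "recognised", [best_label]
    if best >= b:                     # case (iii): in the band between A and B
        return "group", [label for label, s in ranked if s >= b]
    if best - runner_up >= margin:    # case (iv): examine the ranking order
        return "recognised", [best_label]
    return "reject", []               # no decision can be made


print(decide({"o": 80, "e": 78, "c": 40}, a=90, b=70, margin=20))
```

A "group" outcome would then be passed on for further processing of the candidate classes it lists.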

In another example, each pixel group
is presented to the discriminators in order of frequency
of occurrence of the classes represented by the
discriminators. For example, where characters
originating from an English language
text [...], the pixel group may initially be presented to
discriminators representing the vowels (as being the
commonest occurring letters in the English language) and
subsequently to other groups of classes having
successively decreasing frequencies of occurrence.
In a further [...], the discriminator or
discriminators [...]
chosen in accordance with the location of the pixel
group defining the images within the [...]
previously detected images. For example, in the case [...]

The concept of interaction between the classifier
and the recognition process is also utilised,
according to a fourth aspect of the present invention,
in a method of recognising images represented by
respective digital pixel groups, the method comprising
presenting each pixel group to an N-tuple classifier
having a number of discriminators each adapted to
recognise a respective class of a predetermined group of
classes, characterised
in that if none of the discriminator outputs satisfies a
recognition condition but it is determined that the pixel
group defines an image falling within a group of the
classes, the method further comprises presenting a
portion of the pixel group to a subsidiary N-tuple
classifier having a number of subsidiary discriminators
each adapted to recognise a respective portion of the
group of classes.

In the case of the English language, certain letters
such as "o", "e" and "c" have [...]
similar forms and the
classifier may not have been trained sufficiently to
distinguish between them. However, if the right-hand
half of each of those letters is compared, these are
significantly different and thus by training a subsidiary
classifier on the right-hand halves alone, these
particular characters can be distinguished relative to
each other.
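
To make the right-hand-half idea concrete, the short Python sketch below cuts two toy glyphs in half and compares the halves. The 4 x 4 glyphs and the helper names `left_half`, `right_half` and `differing_pixels` are invented for illustration; a real subsidiary classifier would be trained on such halves rather than comparing them directly.

```python
def left_half(rows):
    """Left-hand half of a pixel group given as equal-length rows."""
    width = len(rows[0])
    return [row[:width // 2] for row in rows]


def right_half(rows):
    """Right-hand half of a pixel group given as equal-length rows."""
    width = len(rows[0])
    return [row[width // 2:] for row in rows]


def differing_pixels(a, b):
    """Count pixel positions at which two equally sized groups differ."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


# Toy glyphs (1 = black): an "o" and a "c" share their left-hand side.
glyph_o = [[0, 1, 1, 0],
           [1, 0, 0, 1],
           [1, 0, 0, 1],
           [0, 1, 1, 0]]
glyph_c = [[0, 1, 1, 0],
           [1, 0, 0, 0],
           [1, 0, 0, 0],
           [0, 1, 1, 0]]

print(differing_pixels(left_half(glyph_o), left_half(glyph_c)))
print(differing_pixels(right_half(glyph_o), right_half(glyph_c)))
```

All of the difference between the two toy glyphs lies in the right-hand half, so a classifier that sees only that half is not diluted by the shared left-hand pixels.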

In accordance with a fifth aspect of the invention,
apparatus for recognising images represented by
respective digital pixel groups comprises an N-tuple
classifier having a number of discriminators each adapted
to recognise a respective class of a predetermined group
of classes and to which each pixel group is presented;
recognition means for monitoring the outputs of the
discriminators; and a subsidiary N-tuple classifier
having a number of subsidiary discriminators each adapted
to recognise a respective class of a predetermined group
of classes defining portions of a respective group of
images, the recognition means being adapted to present a
portion of a pixel group to the subsidiary classifier if
it is determined that the discriminator outputs do not
satisfy a recognition condition but the discriminator
outputs define an image falling within the group of
classes.
In all these cases, the method preferably further
comprises storing data defining the recognised class of
the image represented by the pixel group, for which
purpose the apparatus preferably comprises
storage means.

Preferably, in order to reduce processing time,
the pixel group is presented simultaneously to groups of two
or more discriminators in the classifier and, where
appropriate, the subsidiary classifier.

In this specification the bit map described will
generally have a data bus width of one bit. The (image)
processing tasks require access to a memory system
in an incremental fashion and because this is a pixel by pixel
addressable task, using dedicated logic circuits it is
more efficiently organised with the memory bit map
organised with a data bus width of one bit.

In this connection, in order to utilise commercially
available, low cost, memory devices and also to utilise
commercially available, low cost, microprocessor devices
the inventors have recognised that [...]



(1) The first port being a conventional memory access
port designed to suit a particular microprocessor
bus, e.g. an eight bit wide data bus.

(2) The second port being organised to have a bus
width of one bit, with an addressing system [...]
displacements in two axes, since it is desired to
access individual pixels stored in a two dimensional [...]


It is important in both conventional N-tuple
classification and the improvements to [...]
classification described above, to be able to present to
the classifier accurately segmented pixel groups which
are known to define a single image, such as a [...].
In the case of printed text, segmentation is complicated
[...] individual characters are not always
spaced evenly from adjacent characters. For example
proportionally spaced characters have a variable spacing
and certain characters such as the letter pair "fo" may
overlap. These and related problems are outlined in the
IBM reference mentioned above.

In accordance with a sixth aspect of the present
invention a method of segmenting images represented in
bit map form comprises scanning the bit map to determine
the maximum extents of an image in first and second
orthogonal directions, recording for each scan line in
the first direction the coordinates of the extreme pixels
of the image in the second orthogonal direction [...]
previously determined extreme
pixel coordinates.

In accordance with a seventh aspect of the present
invention apparatus for segmenting images represented in
bit map form comprises means for scanning the
bit map to determine the maximum extents of an image in
first and second orthogonal directions and for recording
for each scan line in the first direction the coordinates
of the extreme pixels of the image in the second
orthogonal direction, [...]
coordinates.

This method and apparatus is able to cope with
overlapping and proportionally spaced images. In the
case of touching characters the pixel group comprising
the image block is divided into sub-blocks by an
estimation of the character boundaries within the pixel
block (representing two or more character pixel groups),
for example by a knowledge of the character aspect ratio
(obtained from a histogram analysis of the text); each
sub-block is then submitted separately to a
classification process.
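
One simple way to picture the sub-block estimate for touching characters is sketched below in Python. It assumes, purely for illustration, that the expected character aspect ratio (width divided by height) is already known and that the block is split into equal-width columns; the name `split_touching` and the equal-width rule are assumptions, not details given in the text.

```python
def split_touching(block_width, block_height, aspect_ratio):
    """Estimate column ranges for the characters inside one pixel block.

    aspect_ratio is the expected character width / height.
    Returns a list of (start, end) column ranges, end exclusive.
    """
    expected_char_width = aspect_ratio * block_height
    count = max(1, round(block_width / expected_char_width))
    bounds = [round(i * block_width / count) for i in range(count + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


# A block 30 pixels wide and 20 high, with characters expected to be
# about 0.75 times as wide as they are tall.
print(split_touching(30, 20, 0.75))
```

Each returned column range would then be cut out of the block and submitted to the classifier on its own.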

Typically, in the case of a page of text, the 
scanning of the bit map is carried out in a series of 
horizontally spaced, vertical scan lines and this leads 
to the ability to compensate for skew from a knowledge of 
the line spacing or pitch deduced from a histogram 
analysis of the page of text. 

Preferably, the selecting step comprises scanning 
the bit map in a series of lines extending in a second 
orthogonal direction and spaced apart in the first 
orthogonal direction, each line having a length
corresponding to the distance between the respective 
extreme pixel coordinates. 

Some previous segmentation methods have involved 
locating a black pixel and then examining the immediate 
neighbours to that pixel and subsequently locating one of 
the adjacent pixels which is black and repeating the 
process. This leads to considerable duplication in that 
the same pixels will be examined several times and thus 
segmentation is a relatively slow process. 

In accordance with an eighth aspect of the present
invention, a method of segmenting images represented in
bit map form comprises

a) scanning the bit map to detect a shape which
may comprise an image;

b) recording the location of those pixels in the
bit map which define the detected shape;
and repeating the steps a) and b) to locate other
images while ignoring in step a) each pixel whose
location has been recorded in a step b).

In accordance with a ninth aspect of the present
invention, apparatus for segmenting images represented in
bit map form comprises scanning means for scanning the
bit map to detect a shape which may comprise an image;
and a memory for recording the location of those pixels
in the bit map which define the detected shape, whereby
the scanning means only responds to those pixels of the
bit map whose locations have not been recorded in the
memory.

Typically, step b) comprises providing a second bit
map coterminous with the bit map defining the images, and
recording in the second bit map those pixels which have
been found during the scanning step to correspond to a
detected shape.
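
The role of the second bit map can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name `segment` is hypothetical, and a plain connected-pixel fill is used as a stand-in for shape detection; it is not the extraction process of the invention. What the sketch does show is that every pixel recorded in the second bit map is ignored by later scanning.

```python
def segment(image):
    """Find shapes in a bit map given as rows of 0 (white) / 1 (black).

    Returns a list of shapes, each a set of (x, y) pixel coordinates.
    """
    height, width = len(image), len(image[0])
    shadow = [[0] * width for _ in range(height)]  # second bit map, all white
    shapes = []
    for x in range(width):            # vertical scan lines, left to right
        for y in range(height):
            if image[y][x] != 1 or shadow[y][x] == 1:
                continue              # white, or already recorded: ignore
            shape, pending = set(), [(x, y)]
            shadow[y][x] = 1
            while pending:            # stand-in shape detection (simple fill)
                cx, cy = pending.pop()
                shape.add((cx, cy))
                for dx in (-1, 0, 1):
                    for dy in (-1, 0, 1):
                        nx, ny = cx + dx, cy + dy
                        if (0 <= nx < width and 0 <= ny < height
                                and image[ny][nx] == 1 and shadow[ny][nx] == 0):
                            shadow[ny][nx] = 1   # record in second bit map
                            pending.append((nx, ny))
            shapes.append(shape)
    return shapes


page = [[1, 1, 0, 0, 0],
        [1, 0, 0, 1, 1],
        [0, 0, 0, 1, 0]]
shapes = segment(page)
print(len(shapes))
```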

Preferably, means are provided to ignore isolated
black pixels, during the scanning process, as unwanted
background noise. An isolated black pixel is one where
all its (eight) neighbours are white pixels.
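
The isolated-pixel test can be written directly from this definition. In the Python sketch below the name `is_isolated` is hypothetical, and treating positions outside the bit map as white is an assumption.

```python
def is_isolated(image, x, y):
    """True if the black pixel at (x, y) has eight white neighbours.

    image is a list of rows of 0 (white) / 1 (black). Positions outside
    the bit map are treated as white (an assumption).
    """
    if image[y][x] != 1:
        return False
    height, width = len(image), len(image[0])
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dx == 0 and dy == 0:
                continue              # skip the pixel itself
            nx, ny = x + dx, y + dy
            if 0 <= nx < width and 0 <= ny < height and image[ny][nx] == 1:
                return False          # a black neighbour was found
    return True


noise = [[0, 0, 0, 0],
         [0, 1, 0, 0],
         [0, 0, 0, 1],
         [0, 0, 1, 1]]
print(is_isolated(noise, 1, 1), is_isolated(noise, 3, 2))
```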

The images with which the invention is concerned may
include characters, such as text characters (numbers and
alpha characters) both arabic and non-arabic, and also
other two dimensional predetermined patterns and shapes
as for example obtained by robot manipulators carrying
video cameras.

The bit map defining the images may be generated in
any conventional manner such as by means of a CCD array,
a video scan and subsequent digital processing and the
like.

Particularly advantageous methods and apparatus are 
constituted by combinations of the first to ninth aspects 
of the invention. 

An example of a character recognition system according
to the invention will now be described with reference
to the accompanying drawings, in which:

Figure 1 illustrates the overall system;

Figure 2 illustrates the construction of the
recognition system;

Figure 3 is a flowchart illustrating the operation of
the computer control system;

Figure 4 is a block diagram of the image pre-processing
circuit;

Figure 5 illustrates the memory system;

Figure 6 is a block diagram of the scan-search circuit;

Figure 7 illustrates the segmentation system;

Figures 8A-8D illustrate the extraction process;

Figure 9 illustrates the datum conditions for an
extracted shape;

Figures 10A-10B illustrate the [...]
functions;

Figure 11 [...] a
variable scaling system;

Figure 12 illustrates an example of a scaling [...];

Figures 13A-13B illustrate an example of N-tuple
mapping;

Figure 14 illustrates the classification system;

Figure 15 is a flow chart illustrating the operation of
the classification system;

Figure 16 illustrates a combinational transition
function.

Figure 1 illustrates the OCR (optical character
recognition) system, by which indicia, existing in
printed or written form, are captured as images and
converted to data encoded to a computer industry
standard.

A video scanner (1) produces digitised video data
representing the black or white pixel image data
captured from a scanned page, as a line by line
sequence of the indicia on that page; a scanner video
interface (2) orders the video data into a form for
subsequent data processing. The video data is sent
to a recognition unit (3), the recognition unit output
(4) being the indicia (character) data encoded to a
suitable computer industry standard, such as ASCII
(American Standard Code for Information Interchange).

The scanner (1) may be any convenient form of
commercial optical scanner, with appropriate product
capabilities and facilities, that is a paper-handling
capability (sheet-feed or flat-bed) at an appropriate
image resolution, at an appropriate scan time for a
page. Commercial scanners generally have an image
resolution of 300 dots per inch (dpi), which is
adequate for most OCR purposes. Commercial scanners
are available with scan times of less than 3 seconds
for an ISO standard A4 page size, which allows for a
high speed of character recognition, circa 1000
characters per second. The scanner video interface (2)
may take one of several forms, serial or parallel, e.g.
SCSI (Small Computer Systems Interface).
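
As a back-of-envelope check on the figures quoted above, the Python fragment below works out the size of an A4 page at 300 dpi stored at one bit per pixel, and the number of characters that a 1000 characters per second recogniser could handle during a 3 second scan. The A4 dimensions (210 mm x 297 mm) are the standard ones; the variable names are illustrative only.

```python
MM_PER_INCH = 25.4


def pixels(length_mm, dpi):
    """Number of pixels along a length at the given resolution."""
    return round(length_mm / MM_PER_INCH * dpi)


width_px = pixels(210, 300)    # A4 width
height_px = pixels(297, 300)   # A4 height
bits = width_px * height_px    # one bit per black or white pixel
bytes_needed = bits // 8

chars_per_page = 1000 * 3      # 1000 characters/second over a 3 second scan

print(width_px, height_px, bytes_needed, chars_per_page)
```

On these assumptions a full page occupies a little over one megabyte of bit map memory.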

The scanner (1) may alternatively be constructed
(as a page scanner) by utilising for example a full A4
width CCD (charge coupled device) photo element array
coupled with a control system consisting of analogue
image data capture, thresholding, digital conversion,
timing, scan control and interface circuits, this to
provide digitised video data, representing the [...]
image data captured from the scanned page.

Figure 2 illustrates the overall construction of the
recognition unit (3). The recognition unit system
provides for the segmentation and classification
functions, these functions being the processes of
breaking the scanned images into separate distinct
[...] character, registering the relationship
of the individual (segmented) character images and
classifying the character images into pre-defined
character classes.

The video data from the scanner (1) is interfaced
to the recognition unit (3) by a video interface (5);
this video interface (5) may be any of several known
forms, to suit the scanner's video interface (2). The
video data is fed into an image pre-processing circuit
(6), which processes the video data into a RAM (random
access memory), in the form of an image bit map (7)
having a one bit wide data bus.

The image bit map (7) operates in conjunction
with a shadow bit map (8), having pixel locations
in one to one correspondence with the image bit map (7).
The shadow bit map (8) is used to avoid processing the
same pixels several times; such duplicated processing
occurs in some known segmentation schemes.

A raster scan-search circuit (9) performs a vertical
raster scan of the image bit map (7), starting from the
top left hand corner of the "page". This is to locate
potential characters by searching for black pixels
which have not been previously processed, i.e. to
search for black pixels not in the shadow bit map (8).
A synchronous state machine segmentation system (10) is
used to extract the character shape associated with
the found black pixel.
The extracted character shape is fed into a normalise
and randomise system function (11). The character
shape is, by this function (11), normalised in size and
converted into a random N-tuple form, which is then
loaded into the buffered input of a synchronous state
machine classification system (12). The classification
system (12) identifies (classifies) each character
that is so presented. The identification of the
character is fed into the computer control system (13)
for post-processing. The computer control system (13)
is also used to control certain aspects of the
operation of the recognition unit (3). The computer
control system (13) uses a commercial microprocessor
which is software controlled; the mode of operation is
illustrated in Figure 3.

Output of the character data to the Host system
is via the system interface (14).

The construction of the recognition unit (3) shown
in Figure 2 will now be described in more detail with
reference to Figures 3-16.
The video data received from the scanner (1) is
fed into the image pre-processing circuit (6), via the
video interface (5), as initiated by the computer
control system (13) (Steps 101 and 102, Fig. 3).

The image pre-processing circuit (6) is shown in
more detail in Figure 4. The video data is fed to
control logic (15). Dependent on the OCR application
and dependent on the resolution of the scanner (1), it
may be desired to compress the video data, from say 600
dpi to say 200 dpi. If data compression is necessary,
the control logic (15) will feed the video data to the
compression circuits: horizontal compression (16)
and vertical compression (17); for the example quoted
both of these compression circuits would be 2:1.
The data compression may be arranged to favour white,
to enhance the character bit image separation. A
circuit (18) electronically adds a white border (to
define the boundary conditions to be used for
subsequent scanning of the bit map) to the compressed
(or otherwise) video data, at the time that this video
data is being written into the image bit map (7); at
the same time the shadow bit map (8) having a one bit
wide data bus is cleared to white (Step 103, Fig. 3).
The process continues until the video data is
completely received or until the image bit map is full,
any unfilled portion of the image bit map being written
to white. If the scanned video data exceeds the
capacity of the image bit map (7), then the video data
will require to be loaded (into the recognition unit
(3)) in more than one data transfer operation; this
will be controlled by the computer control system (13).
At the completion of the setting-up of the bit maps (Step
104, Fig. 3) the computer control system (13) sets
bit map pointers to the scan "start" position (Step
105, Fig. 3).
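
The pre-processing steps described above can be pictured with the following Python sketch. It is only an interpretation: "favouring white" is taken here to mean that a compressed pixel is black only when every source pixel in its block is black, and the function names `compress_favour_white` and `add_white_border` and the one-pixel border width are assumptions.

```python
def compress_favour_white(image, factor=2):
    """Reduce resolution by `factor` horizontally and vertically.

    An output pixel is black (1) only if all source pixels in its
    factor x factor block are black, so white is favoured.
    """
    height, width = len(image) // factor, len(image[0]) // factor
    return [[int(all(image[y * factor + dy][x * factor + dx]
                     for dy in range(factor) for dx in range(factor)))
             for x in range(width)]
            for y in range(height)]


def add_white_border(image, margin=1):
    """Surround the image with `margin` rows and columns of white (0)."""
    width = len(image[0]) + 2 * margin
    top_or_bottom = [[0] * width for _ in range(margin)]
    middle = [[0] * margin + row + [0] * margin for row in image]
    return top_or_bottom + middle + [row[:] for row in top_or_bottom]


video = [[1, 1, 1, 0],
         [1, 1, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
bit_map = add_white_border(compress_favour_white(video))
for row in bit_map:
    print(row)
```

The white border means that a scan moving outwards from any black pixel always meets white before it reaches the edge of the bit map.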

Commercially available memory devices, which could
be used to constitute the image and shadow bit maps (7)
and (8), have all been developed for convenient use
with commercial microprocessors. Such memory devices
have memories organised with a data bus width which
suits a particular microprocessor standard, the
commonly encountered data widths being eight, sixteen
or thirty-two bits. In the present application the
memory system is associated with the processing of
shapes (image data) contained within that memory;
the data to be processed exists as black or white
pixels (binary values of picture elements) which are
stored in a two dimensional array of single pixel
values and which is referred to as a bit map. The
image data processing tasks require access to the bit
map memory system in an incremental fashion, which is a
pixel by pixel addressable task. This task is most
efficiently organised with the bit map having a data
width of one bit, since dedicated mixtures of
combinational and sequential logic can then be used to
achieve higher execution speeds than could be achieved
by a conventional microprocessor access to memory via a
multi-bit data bus and software selection.

Figure 5 illustrates in more detail the
organisation of the memory system, which is designed to
use commercially available (low cost) microprocessor
and memory devices and which is constructed as a dual
port system. The first port representing the
microprocessor interface (19) is a conventional memory
access port, designed to suit a particular
microprocessor data bus width, for example using an eight
bit wide data bus. The second port representing the
image processing interface (20) is organised to have a
one bit wide data bus, the addressing system of which
provides for an incremental addressing, allowing
positive and negative movements in the two axes of the
memory plane. Such movements (in the two axes of the
memory plane) are required in the segmentation process
to be described.

An access arbitration circuit (21) prevents a
memory access occurring simultaneously from both the
microprocessor and the image processing interfaces; the
microprocessor interface (19) is one that can be forced
to wait until the memory is ready for it. The access
arbitration logic ensures that only one set of address
and data drivers are energised at any one time, thus
preventing an undesirable "clash" of accesses. A
signal M/S is used to enable one or other set of
drivers, within the address multiplex and write
multiplex circuits (22) and (23). The address multiplex
circuit (22) functions as a selecting switch: according
to which interface has the right of access to the
memory at any particular time, it allows for the
selection of the appropriate address required for any
particular memory access. The write multiplex circuit
(23) functions in a similar way to the address
multiplex circuit (22).

The data organisation of the memory is such that
it is presented to the microprocessor in the familiar
eight bits (byte) format that is used by most
microcomputer memory systems. Each byte of the memory array
(24) comprises four bits of the image bit map (7) and
four bits of the shadow bit map (8). A 1 of 8 write
decoder (25) is required to enable the image processing
interface (20), which handles one pixel at a time, to
selectively write one bit within the eight bits of a
byte. An equivalent function is required for read
access to the memory from the image processing
interface (20), in order for this interface to be able
to select one particular bit from the byte; this is
referred to as the 1 of 8 bit select circuit (26).
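
The byte organisation just described can be modelled in a few lines of Python. The bit layout used here (image pixels in bits 0 to 3, shadow pixels in bits 4 to 7 of each byte) is an assumption, as the text does not state which bit positions are used, and the function names are hypothetical stand-ins for the 1 of 8 write decoder (25) and 1 of 8 bit select circuit (26).

```python
PIXELS_PER_BYTE = 4   # four image bits and four shadow bits share one byte


def write_decode(bit):
    """1 of 8 write decoder: one-hot mask selecting a single bit."""
    return 1 << bit


def bit_select(byte, bit):
    """1 of 8 bit select: extract a single bit from a byte."""
    return (byte >> bit) & 1


def locate(pixel_index, shadow=False):
    """Map a pixel index to (byte address, bit number) in the memory array."""
    address, bit = divmod(pixel_index, PIXELS_PER_BYTE)
    return address, bit + (PIXELS_PER_BYTE if shadow else 0)


def write_black(memory, pixel_index, shadow=False):
    """Write a logic 1 (black) for one pixel without disturbing the rest."""
    address, bit = locate(pixel_index, shadow)
    memory[address] |= write_decode(bit)


def read_pixel(memory, pixel_index, shadow=False):
    address, bit = locate(pixel_index, shadow)
    return bit_select(memory[address], bit)


memory = bytearray(2)                  # room for eight pixels of each bit map
write_black(memory, 5)                 # image bit map, pixel 5
write_black(memory, 5, shadow=True)    # shadow bit map, pixel 5
print(list(memory), read_pixel(memory, 5), read_pixel(memory, 4))
```

The microprocessor port would see the same storage as ordinary bytes, while the image processing port touches one bit at a time.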

The 8 bit data driver (27) does not require the
complexity that would be normally required if a bit
selecting interface was to be provided to allow the
image processing functions to write bits to memory on
an individual basis. This simplification is achieved
because:

(a) the use of one bit wide memory devices allows the
image processing functions to "read from" and "write
to" memory on a single bit basis, i.e. there is no need
for a "read and rewrite" operation that would be
necessary to perform a single bit write operation if 8
bit wide memory devices were used;

(b) the image processing functions require a write
only function of logic "1's" (representing black
pixels) to the shadow memory.

The address register (28), in conjunction with the
offset address adder (29), controls the addressing of
the memory array (24). The address register (28) holds
coordinate pixel location information, corresponding to
the coordinates of the pixels representing the scanned
image video data. The offset address adder (29) is a
binary parallel adder circuit, constructed from
combinational logic; it is capable of handling an X
or Y offset in positive and negative form in order to
be able to address any pixel within the memory array
(24). The addressed pixel can be either (a) left or
right of the horizontal coordinate stored in the
address register (28) or (b) above or below the
vertical coordinate stored in the address register (28).
Negative values (for left and above) are handled by
treating the X and Y addresses as two's complement
binary numbers. The X and Y addresses presented from
the image processing interface (20) need only
address a limited (256 x 256 pixels) area of the memory
array (24), because these addresses are used
for the character segmentation, which requires only
sufficient memory space for one character shape at
a time.
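
As an illustration of the offset addressing just described, the
following Python sketch models the adder in software. The 8 bit
offset width (chosen to match the 256 x 256 pixel window), the
12 bit coordinate width and the function names are assumptions
made for the example; they are not taken from the circuit.

```python
OFFSET_BITS = 8    # assumed: offsets span a 256-pixel window
COORD_BITS = 12    # assumed width of the X and Y address registers


def to_signed(value, bits=OFFSET_BITS):
    """Interpret an unsigned bit pattern as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def offset_address(base_x, base_y, off_x, off_y):
    """Add two's complement X and Y offsets to the stored coordinate."""
    mask = (1 << COORD_BITS) - 1
    x = (base_x + to_signed(off_x)) & mask   # negative offset moves left
    y = (base_y + to_signed(off_y)) & mask   # negative offset moves up
    return x, y
```

With these assumed widths, an offset pattern of 0xFF is read as -1,
so the addressed pixel lies one position to the left of (or above)
the stored coordinate.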

The address register (28) performs various
functions, depending on the operational aspects of
the memory system: (a) In the setting-up of the image
bit map (7) the address register (28) counts through
the XY coordinate addresses of the bit map to allow the
storage of the pixel data, black or white,
corresponding to that coordinate address; for this
operation the XY offset values are set to zero in the
offset address adder (29). (b) In the segmentation
process the "found coordinate" of the character shape
to be segmented is entered into the address register
(28) and the positive or negative movements in X and Y,
necessary for the segmentation, are controlled by the
offset adder (29). Coincident with the segmentation
process the shadow bit map (8) is "read from" and
"written to", the shadow bit map (8) being initially
cleared to zero (all white), which is taken to be the
not yet processed state of the scanned image video
data. Writing to the shadow bit map (8) occurs as the
image bit map (7) is conditionally scanned during the
segmentation process; that is, to segment the character,
the data in the image bit map (7) is read and an
identical pixel is written to the shadow bit map (8).
Thus a copy of the character shape will appear in the
shadow bit map (8), indicating that the character shape
has been segmented. This conditional scanning of the
image bit map (7), to ignore character shapes
previously encountered and segmented, can be simply
achieved by scanning for image bit map pixels which
have corresponding zeros (white pixels) in the shadow
bit map (i.e. not previously seen pixels); this is
implemented with a two input logic gate. An advantage
of the shadow bit map (8) is that the image bit map (7)
is preserved to allow a re-examination of the pixel
data, if so desired.

A transceiver (bidirectional TRANSmitter and
reCEIVER) (30), present in the data path from the memory
(24) to the microprocessor interface (19), isolates the
memory from other data circuits connected to the
microprocessor.

Referring to Figure 3, the next step 106 is to
initiate the scan-search routine. The image bit map
(7) is processed by the scan-search circuit (9), shown
in more detail in Figure 6. This process operates in
conjunction with the segmentation system (10) (to be
described).

The scan process applies a vertical raster scan to
the image bit map (7), starting at the top left hand
corner (relative to the lines of text on the original
scanned document); this position is easily determined
relative to the "white border" applied by the image
pre-processing circuit (6). The raster scan is in the
vertical direction downwards, moving left to right, and
continues until the scan encounters a non-processed
"new" pixel, that is a black pixel which does not appear
in the shadow bit map (8). The first "new" (black)
pixel of a character so encountered by the vertical
scan will be the uppermost-leftmost black pixel of
that character, the XY position of which will be
described as the "found coordinate" of that character.

The vertical raster scan allows for the handling
of skewed lines of text (on the original scanned
document), since each character is found in sequence
according to the spacings of the lines of text. Knowing
the range of spacings of the lines of text and the
vertical coordinates of each character, the text
may be reconstructed on a line by line basis. The
vertical spacings may be easily determined from the
character positional information derived from the scan
process.

As the vertical raster scan of the image bit map
(7) proceeds, the shadow bit map (8) is scanned pixel
by pixel at the same time. The logic state, binary '0'
(white) or binary '1' (black), of the pixels in the
shadow map indicates whether or not the pixel in the
image map currently being accessed has previously been
accessed, i.e. a binary '0' in the shadow map would
indicate a "new" pixel.

The comparison of the binary states of the pixels
between the two bit maps is implemented with a 2 input
logic gate circuit known as the new pixel selector
circuit (31).

As a "new" pixel of a shape is encountered (found)
the "found coordinate" of that (found) pixel is loaded
into a found coordinate register (32) and a message is
sent to the computer control system (13) (step 107,
Fig. 3). The computer control system (13) immediately
starts the segmentation process (to be described) to
extract the character shape (step 108, Fig. 3). When
the character shape is determined (step 109, Fig. 3) it
is possible to continue the raster scan, since the
shadow map is then completed with respect to the
"found" character. The scan-search process continues
until the end of the image bit map (7) is reached (step
110, Fig. 3).
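
The scan-search step can be sketched in software as follows. This
is a minimal Python model, assuming the two bit maps are held as
lists of rows of 0 (white) and 1 (black); the function name and the
resume arguments are hypothetical and do not correspond to named
parts of the circuit.

```python
def find_new_pixel(image, shadow, start_x=0, start_y=0):
    """Vertical raster scan: down each column, columns taken left to right.

    image and shadow are lists of rows holding 0 (white) or 1 (black).
    Returns the (x, y) "found coordinate" of the first new pixel, or None.
    """
    height, width = len(image), len(image[0])
    for x in range(start_x, width):
        first_y = start_y if x == start_x else 0
        for y in range(first_y, height):
            # New pixel selector: black in the image, still white in the shadow.
            if image[y][x] == 1 and shadow[y][x] == 0:
                return x, y
    return None
```

Once a shape has been copied into the shadow map, calling the
function again skips it and returns the first pixel of the next
unprocessed shape, or None at the end of the bit map.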

The use of the shadow bit map (8) provides the
following benefits.

(a) The image bit map (7) is unaltered. This is a
particular benefit where a re-examination of the
image data may be required; such a re-examination
may be achieved by either (i) re-scanning the image
bit map (7) as a whole, or (ii) re-scanning
selected areas by clearing the appropriate areas of
the shadow bit map (8) back to zero (white) to allow
for the pattern(s) to be found again.

(b) Patterns within the image bit map (7) can be of
unknown quantity, location and size. The shadow bit
map (8) ensures that previously processed groups of
pixels corresponding to the patterns already
extracted are not reprocessed.

The segmentation process is initiated, at step 108
Fig. 3, by the computer control system (13). As
previously mentioned, the segmentation system (10) uses
a synchronous state machine (33) (Figure 7), where
combinational transition functions (34) (Fig. 7) are
used to define the conditions and sequence of the state
machine.

A synchronous state machine is one where every
stage of the machine is stepped on simultaneously
under the control of a system clock, thus avoiding the
time penalties which can occur in asynchronous machines
associated with the processing of each stage, e.g.
interrupt routines, polling routines, hand-shaking
routines and the like.
Figure 16 illustrates the use of a combinational
transition function (34), which allows for conditional
decisions to be made at every step and acts as a
combinational logic array, to set the conditions and
hence decide the "next state" out of the state
register. The function inputs are "conditional" and
"feedback"; the outputs are "control" and "next state"
feedback. The total number of "states" defines the
number of bits in the "next state" feedback path. The
combinational transition functions (34) may reside
either (a) in a non-volatile memory such as PROM
(programmable read only memory), PAL (programmable
array logic) etc. or (b) in a volatile memory RAM
(random access memory), which is initialised by
software on power-up of the machine.
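
A table-driven state machine of this kind can be modelled in a few
lines of Python. The sketch below is illustrative only: the three
states, the single conditional input and the control names are
invented for the example, and the list stands in for the PROM or
RAM holding the transition function.

```python
COND_BITS = 1  # assumed: a single conditional input

# Assumed toy table: address = (state << COND_BITS) | condition,
# each entry holds (next_state, control_output).
TRANSITIONS = [
    (0, "idle"),    # state 0, condition 0: wait
    (1, "start"),   # state 0, condition 1: begin
    (2, "step"),    # state 1, condition 0
    (2, "step"),    # state 1, condition 1
    (2, "step"),    # state 2, condition 0: keep stepping
    (0, "done"),    # state 2, condition 1: finished, return to idle
]


def clock(state, condition):
    """One clock tick: look up the control output and the next state."""
    next_state, control = TRANSITIONS[(state << COND_BITS) | condition]
    return next_state, control


def run(conditions, state=0):
    """Apply one conditional input per clock tick; collect control outputs."""
    outputs = []
    for condition in conditions:
        state, control = clock(state, condition)
        outputs.append(control)
    return state, outputs
```

Changing the behaviour of the machine only requires new table
contents, which is the ease of modification the description
attributes to this approach.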

The use of combinational transition functions is
particularly advantageous for this application, due to
their ease of implementation and modification, as
compared to a "logic gate" implementation for the
segmentation system.

The segmentation system (10) is shown in more detail in
Figure 7. The extraction of the character shape is
achieved by the state machine (33) operating in
conjunction with the image bit map (7) and shadow bit
map (8). The process starts with the "found" pixel of
that character shape, the initial condition being when
the image bit map address is at the found coordinate.
The technique used to extract the character shape
and determine its boundary conditions will be explained
in connection with the letter pair "fo" shown in
Figure 8. In the figure this pair of characters are shown
overlapping, in order to illustrate how the
segmenting copes with overlapping characters. The
character overlap situation can be seen most clearly
from Figure 8B, where enclosing rectangles (defined so
that they include each character) are shown and
it will be seen that each rectangle includes a part of
the other character.

The technique used to define the
extent of the character is to perform an iterative
search for the outer edge of the character, i.e. to
find the boundary between the black pixels of the
character and the white pixels surrounding it. Starting
at the black pixel corresponding to the found
coordinate, the search proceeds around the outside of
the boundary and finishes when the start pixel (i.e.
the found coordinate) has been returned to. Whilst this
search is occurring two measurements are taken, (a) for
the size and (b) for the profile of the shape. The
first measurement uses a system of peak detecting
registers, described as excursion registers (35), to
record the maximum horizontal (right-most) and vertical
(topmost and bottom-most) extents of the shape; note
that the left-most extent corresponds to the
"[illegible] value" (of the Y axis) of the found
coordinate. Thus the final
values within the excursion registers (35) represent
the size of the enclosing rectangle for the character
shape. The second measurement uses a pair of random
access memories (36) and (37) to record the left-most
and right-most horizontal pixel coordinates for every
line (one pixel width) of the shape addressed by the
vertical coordinate. The left and right pixels are
indicated by 'L' and 'R' respectively in Figure 8C and
represent the left and right profiles of the character
shape.

The character may now be extracted from the bit
map memory by performing a raster scan (of the bit map
memory) for the enclosing rectangle in the direction
left to right, moving downward, allowing only those
pixels whose coordinates are within the range enclosed
by the left and right profiles of the character. This
is achieved by passing the coordinate values of the
right and left profiles and the coordinates of the
enclosing rectangle to an extract control circuit (38).
This scan will have the effect of removing any
intrusions (within the enclosing rectangle) due to
overlapping characters. The resultant extracted shape
will have the form shown in Figure 8D, which shows the
required result of removing the intruding portions of
the neighbouring letter "o".
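
The two measurements and the profile-limited extraction can be
sketched as below. This Python model assumes the boundary search
has already produced a list of (x, y) boundary pixels that visits
every row of the shape; the boundary search itself is not shown,
and the function names are hypothetical.

```python
def measure(boundary):
    """Derive the enclosing rectangle and per-row profiles from boundary pixels.

    boundary is an iterable of (x, y) coordinates visited by the edge search.
    Returns ((x_min, y_min, x_max, y_max), left_profile, right_profile).
    """
    left, right = {}, {}
    for x, y in boundary:
        left[y] = min(x, left.get(y, x))      # 'L' profile for row y
        right[y] = max(x, right.get(y, x))    # 'R' profile for row y
    rect = (min(left.values()), min(left), max(right.values()), max(left))
    return rect, left, right


def extract(image, rect, left, right):
    """Raster scan the rectangle, keeping only pixels inside the row profiles."""
    x0, y0, x1, y1 = rect
    return [
        [image[y][x] if left[y] <= x <= right[y] else 0
         for x in range(x0, x1 + 1)]
        for y in range(y0, y1 + 1)
    ]
```

A black pixel that falls inside the enclosing rectangle but outside
the profile for its row, such as part of a neighbouring character,
comes out white in the extracted shape.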

The "aligned coordinate" is determined as the
topmost and left-most coordinate of the enclosing
rectangle for that character, as illustrated in Figure
9. The "aligned coordinate" is loaded into the aligned
coordinate register (39).

The extracted shape is then fed into the normalise
and randomise system function (11); at the same time a
message is sent to the computer control system (13)
(step 109, Fig. 3). This message contains the size limits
for the extracted shape and the "aligned coordinate".
The aligned coordinate provides a datum for the
character position and is used to reassemble
(re-compose) the text and the page (of recognised
characters).

The computer control system (13) assesses the
enclosing rectangle for any of the following
conditions:

(a) Too small.

(b) Too large.

(c) Aspect ratio (height to width) incorrect
for a single character.

If condition (a) or (b) applies, the classification
operation is aborted (step 112, Fig. 3) and the pixel
group comprising the extracted character is classed as
unidentifiable, i.e. an unrecognised character. If
condition (c) applies, the unrecognised pixel group block
is divided into a series of sub-blocks, by estimation
of the character boundaries within the pixel group
block, and each sub-block is submitted separately to the
classifier (step 113, Fig. 3).
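
The assessment of the enclosing rectangle might look like the
following in software. The numeric limits, the width-to-height
test and the even division into sub-blocks are all assumptions
chosen for illustration; the description does not give values or
say how the character boundaries are estimated. Width and height
are assumed to be at least one pixel.

```python
MIN_SIZE = 4            # assumed lower limit, in pixels
MAX_SIZE = 256          # assumed upper limit, in pixels
MAX_WIDTH_RATIO = 1.5   # assumed widest plausible width/height for one character


def assess(width, height):
    """Return 'abort', 'split' or 'classify' for an enclosing rectangle."""
    if max(width, height) < MIN_SIZE:
        return "abort"       # condition (a): too small
    if max(width, height) > MAX_SIZE:
        return "abort"       # condition (b): too large
    if width / height > MAX_WIDTH_RATIO:
        return "split"       # condition (c): probably joined characters
    return "classify"


def split_columns(width, height):
    """Estimate sub-block column ranges by dividing the width evenly."""
    count = max(1, round(width / height))
    edges = [round(i * width / count) for i in range(count + 1)]
    return list(zip(edges[:-1], edges[1:]))
```

Each column range returned by split_columns would then be extracted
and submitted to the classifier as a separate sub-block.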

The extracted (segmented) character shape is
required to be "normalised" to a standard "enclosing
rectangle" size, e.g. 32 x 32 pixels, prior to
classification. Since the extracted character shape may
be any size (in pixels) the normalisation can be
achieved by initially scaling downwards in area by, say,
16:1 or 64:1 ratios, so that the scaled shape size
is less than the required normalisation size, and then
using a look-up table approach to achieve the required
normalisation result. Figure 10A illustrates the
technique. The initial scaling (downwards) is achieved
by the fixed scaling system (40) and the size
"normalisation" by the variable scaling system (41).

An example of a variable scaling system (41) is
shown in more detail in Figure 11. The system has
horizontal and vertical counters (42) and (43), driven
by a clock which has a frequency chosen to suit the
cycle time of the memories. The horizontal and
vertical counters (42), (43) are a pair of counters
which count from zero to full house ones during the
scaling operation. Horizontal and vertical size
registers (44) and (45) are provided, each comprising a
5 bit register whose contents do not change during the
scaling operation, having been (previously) set to the
actual size of the character shape within the bit map
memory. The values held in the size registers are
actually one less than the size of the shape to be
scaled, i.e. a size register value of 00011 binary (3
decimal) indicates, to the scaling system, that the
shape has a size of four pixels in that particular
direction, horizontal or vertical. The horizontal and
vertical scaling memories (46), (47) connected with the
respective counters and registers (42)-(45) are
identical and conveniently consist of 1024 by 5 bit
static RAMs (Random Access Memories), although a read
only form of memory could also be used. Scaling tables
would be written to RAMs at power-up, whereas read
only memories would have scaling tables already "burnt
in". Each 1024 by 5 bit scaling memory has a ten bit
address, made up from the five bits (each) of the
appropriate counter and size registers, (42), (44) and
(43), (45). The 5 bits from the counter will count up
from zero as the scaling is performed while the 5 bits
from the size register remain constant so that, from
the scaling tables in Figure 12, a sequence of pixel
numbers may be generated. These are used as pixel
pick-up addresses by the bit map memory that holds the
shape being scaled. For any particular address
presented to the scaling memory, a 5 bit data word will
be available at the data out terminals. These are
referred to as the modified x and y addresses, x and y
corresponding to horizontal and vertical pixel
coordinates within the shape in the bit map memory.
The two groups of modified addresses are used as a ten
bit address for the bit map memory and the overall
effect has been to generate an address sequence which
has been pixel by pixel adjusted by the scaling
memories in such a way that pixels are picked up from
the bit map in a repeated fashion. The extent to which
the pixels are repeated is a function of the size of
the shape as defined by the values in the size
registers. If the size registers are set to 11111 then
the shape within the bit map is already at maximum and
the particular sequence of modified addresses, produced
by the scaling tables, will be a count that is
identical to the output from the counter for that axis.
Referring to the scaling table (Fig. 12) entries,
e.g. the horizontal list of numbers for SIZE = 31,
[illegible] values. The sequence for SIZE = 31 is itself a
count from 0 to 31. In this instance the scaling tables
will have had no effect, but for all sizes less
than 11111 (31 decimal) the shape will be scaled. The
scaling tables Fig. 12 are to be further described.

The output memory (48) is a static Random Access
Memory providing storage of 1024 by 1 pixels. During
the scaling operation the ten bit address to this
output memory is driven by the two counters (42),
(43), horizontal and vertical, from zero to full house
ones. Every pixel is thus addressed once, with the
black or white value written into the particular
location being derived from the pixels stored in the
bit map that had been addressed by the address that
has been modified as described.

Referring again to Figure 10A, the variable
scaling algorithm, in relation to the operation of the
scaling tables, may be represented as follows.
Variable scaling, on a given axis, is determined by the
maximum size of the intermediate block (Fig. 10A)
along that axis. If N is the maximum (pixel) excursion
in the intermediate block, then the 'Y' axis of the
table (labelled Size S) equals (N-1). The 'X' axis of
the table (labelled P) is the Pixel Number (P) for the
final pixel block, i.e. P goes from 0 to 31 for a
32x32 normalised pixel block. The table value, as
selected by the 'X' and 'Y' table coordinates, is the
Pixel Number M in the intermediate block. Referring to
Figure 10B, in conjunction with the scaling tables of
Fig. 12, if the maximum (pixel) excursion along an axis
in the intermediate block is 25, then the Size S, equal
to (N-1), is 24 and the pixel states (black or white) in
the final pixel block P are determined from the pixel
numbers (locations) derived from the tables. For the
example shown in Fig. 10B, the pixel state (black or
white) at the final pixel location P=10 will be the
pixel state at the intermediate pixel location M=7.
Similarly for P=24 the pixel state will be that at the
intermediate location M=19.
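
The look-up table approach to variable scaling can be illustrated
with the Python sketch below. The rule used to fill the table,
M = floor(P x (S + 1) / 32), is an assumption: it is a simple
nearest-neighbour rule that gives the identity sequence for
SIZE = 31, but it need not reproduce every entry of the tables in
Fig. 12, which may use a different rounding.

```python
BLOCK = 32  # the normalised block is 32 x 32 pixels


def build_scaling_table():
    """table[S][P] -> pixel number M in the intermediate block.

    S is the size register value (actual size minus one), P the output pixel.
    """
    return [[(p * (s + 1)) // BLOCK for p in range(BLOCK)]
            for s in range(BLOCK)]


def normalise(shape, table):
    """Scale a small shape (list of rows of 0/1) up to BLOCK x BLOCK pixels."""
    s_y, s_x = len(shape) - 1, len(shape[0]) - 1
    return [[shape[table[s_y][y]][table[s_x][x]] for x in range(BLOCK)]
            for y in range(BLOCK)]
```

For a shape four pixels wide (size register value 3) each source
pixel is picked up eight times in succession, which is the repeated
pick-up behaviour described for the scaling memories.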

The scaled "normalised" character shape is now
presented to the randomisation function (Fig. 10A). The
randomisation function generates pseudo-random n-tuples
by use of another look-up table. The requirement is
that the normalised (32x32) pixel group block must be
mapped into a series of n-tuples such that:

(a) The grouping of pixels (n-tuples) is selected on a
random basis.

(b) The selection of the pixels is such that no pixel
appears in more than one n-tuple and then only
once.

(c) The pixel block (32x32) must be completely mapped,
i.e. every pixel must appear in the set of n-tuples.

This requirement for random n-tuples is dealt with
in the referenced papers on N-tuple technology.

Referring to Figure 13A, this represents the 32x32
pixel block mapped into 128 separate 8-tuples. The
initial relationship between the pixels selected to
form 8-tuples is random, but once this random selection
has been chosen it remains unchanged. A look-up table
can be constructed to map a given pixel location to a
given bit number in a given 8-tuple. For the example
shown in Fig. 13A, if the 32x32 pixel block coordinates
are set such that coordinate 0,0 corresponds to the top
left hand corner of the map then the mapping
illustrated would correspond to the (part) table of
Figure 13B. Such a table could be constructed which
would identify each bit in each 8-tuple with a
specific pixel location in the pixel block. The bit
value would be '1' or '0' to correspond with the black
or white state of the pixel so located.
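
The three mapping requirements are met by any fixed random
permutation of the 1024 pixel locations, as the following Python
sketch shows. The use of a seeded software shuffle is an assumption
standing in for the fixed look-up table; the function names are
hypothetical.

```python
import random

BLOCK = 32   # normalised block is 32 x 32 pixels
N = 8        # pixels per n-tuple


def build_mapping(seed=0):
    """Fixed random permutation of pixel locations.

    Entry i gives the (x, y) pixel feeding bit i % N of tuple i // N.
    """
    pixels = [(x, y) for y in range(BLOCK) for x in range(BLOCK)]
    random.Random(seed).shuffle(pixels)
    return pixels


def to_ntuples(block, mapping):
    """Pack a BLOCK x BLOCK block of 0/1 pixels into 128 eight-bit values."""
    tuples = [0] * (len(mapping) // N)
    for i, (x, y) in enumerate(mapping):
        if block[y][x]:
            tuples[i // N] |= 1 << (i % N)
    return tuples
```

Because the permutation contains every location exactly once, no
pixel is shared between n-tuples and the block is completely
mapped; the same mapping must be used for training and recognition.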

The preceding descriptions (related to Figure
10A) are not intended to imply that the described
functions of Fixed Scaling, Variable Scaling
(Normalisation) and Randomising are separate serial
activities. They have been so described for ease of
understanding. The normalise and randomise function
(11) is such that the three functions (described) are
carried out in an overlapping sequential manner so as
to appear as a single integrated function.

The look-up tables may conveniently reside in
either (a) non-volatile memory (e.g. PROM, PAL etc.) or
(b) volatile memory (e.g. RAM), initialised by
software on power-up of the machine.
That is, the same approach to look-up tables can be made
as previously described for combinational transition
functions.

An additional advantage of this technique is that 
the various areas may ultimately be implemented as PAL 
based functions, which offer protection of the design 
against unauthorised copying, since such (PAL based) 
functions are very much more difficult to reverse 
engineer than PROM based functions. 

The final operation is to load the N-tuple buffer
input to the classification system (12) and to send a
"finished" message to the computer control system (13)
(step 116, Fig. 3).

The computer control system (13) now decides as to 
whether to proceed with the classification process, or 
to abort for the reasons previously stated, or to go 
into a classification sub-routine. 

The classification (normal routine) is initiated
at step 115, Fig. 3 by the computer control system (13).
The classification system (12), as previously mentioned,
is a synchronous state machine. The approach is
similar to that already described in relation to the
synchronous state machine for the segmentation system.
That is, combinational transition functions are used to
define the conditions and sequence of the state
machine.

The methods of operation of random n-tuple
classifiers are described in the reference papers on
N-tuple technology. The "classifier" is pre-trained
with the range of patterns or classes it is required to
recognise. When an unknown pattern is entered, the
classifier responds with a ranking list of the classes,
i.e. of the "most like" scores relative to the training
set. The N-tuple method (technique) is essentially a
means of comparing the unknown pattern with the range
of patterns already "learnt" by the classifier, so that
the classifier can make "most like" decisions. The top
ranked (score) would (normally) be selected as that
representing the pattern; in the preferred embodiment,
this selection would also be dependent on:

(a) The score relative to some threshold A above which
the character is identified (classified).

(b) The score relative to some threshold B below which
the character is not identified.

(c) The Ranking Order of the scores, i.e. the relative
discrimination between the top ranked class and
the next highest scoring class or classes.
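
One way in which the three criteria could be combined is sketched
below in Python. The description does not say how thresholds A and
B interact with the ranking order, so the rule used here (accept
above A, reject below B, and otherwise require a minimum margin
over the runner-up) is an assumption, as are the parameter names.

```python
def decide(scores, threshold_a, threshold_b, min_margin):
    """Pick a class from a dict of class -> score, or return None (reject)."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    best, top = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0
    if top < threshold_b:
        return None          # (b) too weak to identify
    if top >= threshold_a:
        return best          # (a) strong enough on its own
    # (c) between the thresholds: require clear separation from the next class
    return best if top - runner_up >= min_margin else None
```

A None result corresponds to a character that is left unidentified
for post-processing or reclassification.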

The classification system (12) is shown in more 
detail in Figure 14. The mode of operation of the 
classification system is illustrated in Figure 15. 

The classification system comprises an n-tuple
counter (50) and a (Class) group counter (51). These
counters are driven by the same system clock which
drives the scaling system previously described. The
n-tuple and group counters (50), (51) comprise a seven
bit counter and a three bit counter respectively,
connected as a ten bit counter. This counter counts
from zero to full house ones during the response
calculation operation. Initially the counters are set
to zero (step 200, Figure 15). The output from the
n-tuple counter (50) is a number which is used as an
address for an n-tuple memory (49). The seven bits are
used to sequentially address the 128 n-tuples stored
within that memory.

The n-tuple memory (49) will have previously
been loaded from the normalise and randomise system
function (11) and will contain a random n-tuple pattern
of bits that represent the normalised shape that has
been extracted from the bit map memory (7). The n-tuple
memory (49) consists of a static Random Access Memory
with a storage capacity of 128 eight bit values, these
values being the n-tuples that make up the shape to be
recognised (i.e. n=8).

The n-tuples are addressed sequentially by the
incrementing n-tuple counter (50) and the eight bit
values of those n-tuples are presented to a
discriminator memory (53), as addresses which are
combined with both the seven bit output of the n-tuple
counter (50) and the four bit output from the group
counter (51) to produce a 19 bit address that is used
by the discriminator memory (53).

Note: It is assumed that the discriminator memory has
previously been loaded with the responses
generated from training data as previously
described and as referenced in the papers that
describe the operation of an N-tuple based
recognition system.

The discriminator memory (53) is a Random Access Memory
constructed from Dynamic Random Access Memory elements
which are organised, for the purposes of a parallel
response discriminator, as an eight bit wide data bus
memory system.

During the calculation of the responses, the
values read from the discriminator memory (53) are
interpreted as single bit responses (step 202, Fig. 15);
these are required to be summed in order to produce
total responses for all classes that are being tested
for possible recognition. In order to provide these
summed totals, a collection of eight bit counters or
incrementors (54) are connected to the data output
terminals of the discriminator in such a fashion that
they will increment, or count up by one, if the
particular discriminator data bit corresponding to that
particular value of the n-tuple is a logic one. If the
discriminator provides a logic zero then the up counter
will ignore it and retain its current value. All of
the incrementors (54) are cleared to zero (step 201,
Fig. 15) at the beginning of every group, i.e. when the
value in the n-tuple counter goes from 11111 binary
(31 decimal) to zero and the group counter increments
by one. This initialises the incrementors (54) ready to
produce response sum totals for the eight sub classes
that constitute the next group.

Before the n-tuple counter begins its incrementing
sequence from zero to 31 decimal, a class counter (55)
is used to read the eight bit values that have accrued
in the response incrementors (54) (step 204, Fig. 15)
and to write them into a table of responses stored in a
responses memory (56) (step 205, Fig. 15). The responses
memory (56) consists of a static Random Access Memory,
organised according to the number of classes of
recognition (classification).
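
The response calculation can be modelled in software as shown
below. This Python sketch assumes a dictionary keyed by (group,
n-tuple index, n-tuple value) as a sparse stand-in for the
discriminator memory, with each stored word carrying one bit per
class in the group; the training helper and all names are
hypothetical.

```python
CLASSES_PER_GROUP = 8   # one class per bit of the discriminator data word


def train(memory, group, cls, ntuples):
    """Set the bit for class cls at each (group, index, value) address."""
    for index, value in enumerate(ntuples):
        key = (group, index, value)
        memory[key] = memory.get(key, 0) | (1 << cls)


def responses(memory, groups, ntuples):
    """Return one table row of eight summed responses per group."""
    table = []
    for group in range(groups):
        counters = [0] * CLASSES_PER_GROUP        # cleared for each group
        for index, value in enumerate(ntuples):   # n-tuple counter
            word = memory.get((group, index, value), 0)
            for cls in range(CLASSES_PER_GROUP):
                counters[cls] += (word >> cls) & 1
        table.append(counters)                    # stored in responses memory
    return table
```

Each row of the returned table holds the summed responses of the
eight classes in one group, so the highest entry identifies the
class whose training patterns the unknown shape most resembles.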

At the completion of the classification function
(step 206, Fig. 15) a message is sent to the computer
control system (13), providing classification data (step
116, Fig. 3). The computer control system (13) then
proceeds with the initial post-processing (stage 1) (to
be described) and recommences the segmentation routine
(step 117, Fig. 3), i.e. returns to step 108, Fig. 3.

The segmentation/classification sequence continues
until all the characters are classified, i.e. until all
the patterns within the image bit map (7) have been
extracted, segmented, normalised and classified. When
the classification "finish" is reached (Fig. 3),
the computer control system (13) continues the
post-processing (stage 2).

The initial post-processing (stage 1) is to check
for items such as punctuation, known ambiguities,
nonsense (based on known response values), invalid
classes etc. and to complete the identification of the
character.

The final post-processing (stage 2) is to
reassemble the character data into a "format" with the
classification "errors" of type (I) below highlighted.
Classification errors are:

(I) Reject error, where the classifier is unable to
make a true decision.

(II) Substitution error, where the classifier makes a
wrong decision.

In the case of (I) above, it is possible, by known
computing means, to arrange to output from the
recognition unit, for subsequent display, the entire
pixel group representing the character; this allows for
a human interrogation (intervention). To allow for
this facility the output from the stage 1 post-processing
should be loaded into a "shape buffer"
memory store.

In order to ensure a correct ordering of each
pattern as it is classified, the stage 1 post-processing
software has to arrange to "tag" each
result with the image map location data previously
received (steps 107, 109, Fig. 3); this information may
then be used, in the embodiment for text recognition,
to recompose the page and ensure the correct ordering
of the recognised characters, as described for stage 2
post-processing.

In the event that a result from the stage 1 post-processing
is that no one character class is
sufficiently clear, the computer control system (13) may
decide to require that the character is reclassified,
e.g. by presentation to a sub-set of classes as
previously explained. It should also be noted that
the order in which the discriminator memory (53) is
accessed may be arranged in a special form, again as
previously described. For example, in the case of
English language text, the first classes accessed may
comprise the vowels. In this case, the computer control
system (13) may compare each response against
predetermined recognition criteria and as soon as those
criteria are satisfied will terminate further
classification.
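
The ordered presentation with early termination can be sketched as
follows. In this Python model each discriminator is represented by
a hypothetical scoring function and the recognition criterion is
taken to be a single threshold; both are simplifying assumptions
made for the example.

```python
def classify_in_order(discriminators, order, pattern, threshold):
    """Present pattern to discriminators in a fixed order, stopping early.

    discriminators maps class name -> scoring function (hypothetical).
    Returns (class, score, number of discriminators tried).
    """
    tried = 0
    for name in order:
        tried += 1
        score = discriminators[name](pattern)
        if score > threshold:            # recognition condition satisfied
            return name, score, tried
    return None, 0, tried
```

Placing the most frequently occurring classes first in the order
reduces the average number of discriminators consulted per
character.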

Post-processing can be used to carry out other 
functions, to improve the error rate and to provide 
10 special facilities, for example: 

(a) Minimise errors due to case confusion. 

(b) Minimise errors due to alpha/numeric confusion. 
CO Allow the definition of selected fields within an 

image and select those fields only to be processed. 
15 (d) Allow selected fields to be defined as alpha or 
numeric or mixed. 

(e) Apply additional rules to pixel groups of patterns 
which are not recognised, or poorly discriminated. 

(f) Apply dictionary and/or context correction 
20 techniques to reduce errors. 

(g) Ensure the classified patterns are ordered to a 
predetermined format, as appropriate to the 
appl ication . 
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CLAIMS 

1. Image recognition apparatus comprising a first 
synchronous state machine fcr segmenting a number of 

5 images defined in bit map form into separate pixel 
groups; and a second synchronous state machine to which 
each pixel group is applied for classification. 

2. Apparatus for recognizing images represented by 
respective digital pixel groups, the apparatus comprising 

10 an N-tuple classifier including a number of 
discriminators each adapted to recognise a respective 
class of a predetermined group of classes and to which 
the pixel groups are presented, the apparatus being 
arranged to present each pixel group to the 

15 discriminators in a predetermined sequence; and 
recognition means for monitoring the output of the 
discriminators and for terminating the presentation of 
the pixel group to the classifier as soon as the output 
from a discriminator satisfies a recognition condition. 

20 3. Apparatus according to claim 1 and claim 2. 

4. A method of recognizing images represented by 
respective digital pixel groups, the method comprising 
presenting each pixel group to an N-tuple classifier 
having a number of discriminators each adapted to
recognise a respective class of a predetermined group of
classes, characterised in that each pixel group is
presented to the discriminators in a predetermined 
sequence; and in that as soon as the output from a 
discriminator satisfies a recognition condition, the
presentation of the pixel group to the classifier is
terminated. 

5. A method according to claim 4, comprising comparing 
the output from each discriminator with a threshold, the 
recognition condition being satisfied when the threshold
is exceeded.

6. A method according to claim 4, wherein each pixel
group is presented to the discriminators in the order of 
frequency of occurrence of the classes represented by the 
discriminators.
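
The sequential presentation of claims 4 to 6 can be sketched as follows. This is a minimal software illustration only, assuming each discriminator stores the pixel patterns seen during training for a fixed set of N-tuples and responds with the number of tuples it recognises. The class `Discriminator`, the function `classify`, the tuple layout and the frequency figures are hypothetical and are not taken from the claims.

```python
class Discriminator:
    """Toy N-tuple discriminator for one class (illustrative assumption)."""

    def __init__(self, label, tuples):
        self.label = label
        self.tuples = tuples                  # each tuple lists pixel indices
        self.seen = [set() for _ in tuples]   # one memory per tuple

    def _addresses(self, pixels):
        # The pixel values sampled by a tuple form that tuple's address.
        for t in self.tuples:
            yield tuple(pixels[i] for i in t)

    def train(self, pixels):
        for memory, address in zip(self.seen, self._addresses(pixels)):
            memory.add(address)

    def score(self, pixels):
        # Response: number of tuples whose address was seen in training.
        return sum(address in memory
                   for memory, address in zip(self.seen, self._addresses(pixels)))


def classify(pixels, discriminators, threshold):
    """Present the pixel group in sequence; stop once the threshold is exceeded."""
    for tried, d in enumerate(discriminators, start=1):
        if d.score(pixels) > threshold:
            return d.label, tried             # remaining discriminators are skipped
    return None, len(discriminators)


# Hypothetical setup: two classes, ordered by assumed frequency of occurrence.
TUPLES = [(0, 1), (2, 3)]
frequency = {"e": 0.12, "t": 0.09}            # assumed figures, illustration only
e, t = Discriminator("e", TUPLES), Discriminator("t", TUPLES)
e.train([1, 1, 0, 0])
t.train([0, 0, 1, 1])
ordered = sorted([t, e], key=lambda d: frequency[d.label], reverse=True)
```

With this ordering, a pixel group matching the most frequent class is recognised after a single presentation, which is the saving that ordering by frequency of occurrence is intended to give.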

7. A method according to claim 4, wherein the
discriminator or discriminators to which each pixel group
is applied are chosen in accordance with the location of 
the pixel group defining the images within the context of 
the previously detected images. 

8. A method for recognising images represented by
respective digital pixel groups, the method comprising
presenting each pixel group to an N-tuple classifier
having a number of discriminators each adapted to
recognise a respective class of a predetermined group of
classes, characterised in that if none of the
discriminator outputs satisfies a recognition condition
but it is determined that the pixel group defines an
image falling within a group of the classes, the method
further comprises presenting a portion of the pixel group
to a subsidiary N-tuple classifier having a number of
subsidiary discriminators each adapted to recognise a 
respective portion of the group of classes. 

9. A method according to claim 8, further comprising 
storing data defining the recognized class of the image
represented by the pixel group.

10. A method according to claim 8 or claim 9, wherein 
each pixel group is presented simultaneously to groups of 
two or more discriminators in the classifier and, where 
appropriate, the subsidiary classifier. 
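
The fallback of claims 8 to 10 can be sketched as below. The claims do not state here how it is determined that a pixel group falls within a group of classes; this sketch assumes, purely for illustration, that the best-scoring class is a member of a predefined group of confusable classes. The function name `classify_with_fallback`, the dictionary layout and the scoring callables are all hypothetical.

```python
def classify_with_fallback(pixels, scorers, threshold, groups):
    """Use the main classifier; fall back to a subsidiary one on a near miss."""
    # Main classifier: one score per class (stand-ins for discriminator outputs).
    scores = {label: score(pixels) for label, score in scorers.items()}
    best = max(scores, key=scores.get)
    if scores[best] > threshold:
        return best                           # recognition condition satisfied

    # Assumption: the image "falls within a group" if the best class is a member.
    for group in groups:
        if best in group["members"]:
            # Only a portion of the pixel group goes to the subsidiary classifier.
            portion = [pixels[i] for i in group["portion"]]
            sub = {label: score(portion)
                   for label, score in group["scorers"].items()}
            return max(sub, key=sub.get)
    return None                               # not recognised
```

In use, `groups` might hold one entry for a set of similar characters, with `portion` naming the pixel indices where those characters differ.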

11. Apparatus for recognizing images represented by
respective digital pixel groups, the apparatus comprising 
an N-tuple classifier having a number of discriminators 
each adapted to recognise a respective class of a 
predetermined group of classes and to which each pixel
group is presented; recognition means for monitoring the
outputs of the discriminators; and a subsidiary N-tuple 
classifier having a number of subsidiary discriminators 
each adapted to recognise a respective class of a 
predetermined group of classes defining portions of a 
respective group of images, the recognition means being
adapted to present a portion of a pixel group to the 
subsidiary classifier if it is determined that the 
discriminator outputs do not satisfy a recognition 
condition but the discriminator outputs define an image 
falling within the group of classes.

12. Apparatus according to claim 11, further comprising 
storage means for storing data defining the recognized 
class of the image represented by the pixel group. 

13. A method of segmenting images represented in bit map 
form, the method comprising scanning the bit map to
determine the maximum extents of an image in first and 
second orthogonal directions and recording for each scan 
line in the first direction the coordinates of the 
extreme pixels of the image in the second orthogonal 
direction; and selecting as defining an image only those
pixels within a rectangle defined by the previously 
determined extents and falling within the previously 
determined extreme pixel coordinates. 
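
The two passes of claim 13 can be sketched as follows. This is a minimal illustration, assuming that scan lines in the first direction are bit map rows and that the extreme pixels are the leftmost and rightmost columns of the image in each row. How the pixels of one image are traced in the first pass is not specified in this claim, so the sketch takes a hypothetical list `traced` of (row, column) coordinates; the function names `measure` and `select` are also assumptions.

```python
def measure(traced):
    """First pass: overall extents plus the extreme columns for each row."""
    rows = [r for r, _ in traced]
    cols = [c for _, c in traced]
    per_row = {}
    for r, c in traced:
        lo, hi = per_row.get(r, (c, c))
        per_row[r] = (min(lo, c), max(hi, c))
    rect = (min(rows), max(rows), min(cols), max(cols))  # top, bottom, left, right
    return rect, per_row


def select(bitmap, rect, per_row):
    """Second pass: keep set pixels inside the rectangle and the row extremes."""
    top, bottom, left, right = rect
    picked = []
    for r in range(top, bottom + 1):
        if r not in per_row:
            continue                          # the image has no pixels on this row
        lo, hi = per_row[r]
        for c in range(max(left, lo), min(right, hi) + 1):
            if bitmap[r][c]:
                picked.append((r, c))
    return picked


# An L-shaped image; the pixel at (0, 2) belongs to an overhanging neighbour.
page = [[1, 0, 1],
        [1, 0, 0],
        [1, 1, 1]]
traced = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
rect, per_row = measure(traced)
image_pixels = select(page, rect, per_row)
```

The neighbour's pixel at (0, 2) lies inside the bounding rectangle but outside the recorded extremes for row 0, so it is excluded from `image_pixels`.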

14. A method according to claim 13, wherein the scanning 
of the bit map is carried out in a series of horizontally
spaced, vertical scan lines and this leads to the ability 
to compensate for skew from a knowledge of the line 
spacing or pitch deduced from a histogram analysis of the 
page of text. 

15. A method according to claim 13 or claim 14, wherein
the selecting step comprises scanning the bit map in a 
series of lines extending in a second orthogonal 
direction and spaced apart in the first orthogonal 
direction, each line having a length corresponding to the